WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks

Authors

  • Javier Artiles
  • Andrew Borthwick
  • Julio Gonzalo
  • Satoshi Sekine
  • Enrique Amigó

Abstract

The third WePS (Web People Search) Evaluation campaign took place in 2009-2010 and attracted the participation of 13 research groups from Europe, Asia and North America. Given the top web search results for a person name, two tasks were addressed: a clustering task, which consists of grouping together web pages referring to the same person, and an extraction task, which consists of extracting salient attributes for each of the persons sharing the same name. Continuing the path of previous campaigns, this third evaluation aimed at merging both problems into one single task, where the system must return both the documents and the attributes for each of the different people sharing a given name. This is not a trivial step from the point of view of evaluation: a system may correctly extract attribute profiles from different URLs but then incorrectly merge profiles. This campaign also featured a larger testbed and the participation of a state-of-the-art commercial WePS system in the attribute extraction task. This paper presents the definition, resources, evaluation methodology and results for the clustering and attribute extraction tasks.
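The evaluation difficulty noted above, where per-page extraction is correct but profiles are merged incorrectly, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the URLs, the attribute values, the two clusterings, and the helper `merge_profiles` are hypothetical and are not part of the WePS-3 resources or its official scorer.

```python
from collections import defaultdict

# Hypothetical toy data: attributes extracted per page for one ambiguous name.
# Assume every page-level extraction here is correct.
page_attributes = {
    "url1": {("occupation", "professor")},
    "url2": {("affiliation", "MIT")},
    "url3": {("occupation", "guitarist")},
}

# Hypothetical gold clustering: url1 and url2 are person A, url3 is person B.
gold_clustering = {"url1": "A", "url2": "A", "url3": "B"}

# Hypothetical system clustering: url2 is wrongly grouped with url3.
system_clustering = {"url1": 0, "url2": 1, "url3": 1}


def merge_profiles(attributes, clustering):
    """Union the page-level attributes of all pages in each cluster.

    Returns a set of profiles (each a frozenset of attribute pairs), so two
    results can be compared without caring about cluster labels.
    """
    profiles = defaultdict(set)
    for url, cluster in clustering.items():
        profiles[cluster] |= attributes[url]
    return {frozenset(profile) for profile in profiles.values()}


gold_profiles = merge_profiles(page_attributes, gold_clustering)
system_profiles = merge_profiles(page_attributes, system_clustering)

# The page-level attributes are identical in both cases, yet the merged
# person profiles differ because of the clustering error.
print(gold_profiles == system_profiles)
```

In this toy case the system attributes the MIT affiliation to the guitarist, which is why a combined task has to score the clustering and the attribute profiles together rather than separately.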


Similar resources

WePS 2 Evaluation Campaign: Overview of the Web People Search Clustering Task

The second WePS (Web People Search) Evaluation campaign took place in 2008-2009 with the participation of 19 research groups from Europe, Asia and North America. Given the output of a Web Search Engine for a (usually ambiguous) person name as query, two tasks were addressed: a clustering task, which consists of grouping together web pages referring to the same person, and an extraction task, wh...


CASIANED: People Attribute Extraction based on Information Extraction

In this paper, we describe the people attribute extraction system of the CASIANED team for the second Web People Search evaluation (WePS-2). We develop an attribute extraction system based on information extraction. First, the attribute candidates for every attribute class are extracted using several different information extraction techniques; then these candidates are verified through classi...


TALP at WePS-3 2010

In this paper we present our system and experiments at the Third Web People Search Workshop (WePS-3) task for clustering web people search documents in English. In our experiments we used a simple approach with three algorithms: Lingo, Hierarchical Agglomerative Clustering (HAC), and a 2-step HAC algorithm. We also present the results and initial conclusions in the context of the WePS-3 Task 1 f...


Combining Evaluation Metrics with a Unanimous Improvement Ratio and its Application to the Web People Search Clustering Task

This paper presents the Unanimous Improvement Ratio (UIR), a measure that allows systems to be compared using two evaluation metrics without dependencies on relative metric weights. For clustering tasks, this kind of measure becomes necessary given the trade-off between precision- and recall-oriented metrics (e.g. Purity and Inverse Purity), which usually depends on a clustering threshold parameter s...


Which Who are They? People Attribute Extraction and Disambiguation in Web Search Results

A search for a person name often returns many Web pages containing that name string. Because many people share the same name, extracting target person attributes (such as birthday, occupation, affiliation, nationality, contact information, etc.) is expected to help differentiate documents related to different people and thus group documents related to the same person. This paper presents the methodol...




Publication date: 2010